Prof Tony Hey, Science and Technology Facilities Council (STFC)
Marwin Segler, BenevolentAI
Symposium Chair
Rick Stevens, Associate Laboratory Director for Computing, Environment and Life Sciences
Symposium Description
Artificial Intelligence (AI) is poised to have major impacts on many facets of biomedical research in the near future. The recent successes of deep learning in computer vision, natural language processing, game playing and autonomous vehicle development provide hints of what is to come from advances in AI applied to problems in biology and health research. Major advances have already occurred through the use of AI methods in drug development, cancer diagnosis, pathology, genome sequence analysis, patient risk analysis and prediction of antibiotic resistance. Today it is becoming routine to build machine learning-based predictive models where we have significant amounts of data but lack an underlying theory of mechanism. With careful uncertainty analysis these models can be used in place of traditional mechanistic models in some applications. Generative models can be used to create large collections of synthetic data that model real data, and can be used to generate drug candidates, example DNA sequences and “synthetic patients” for computational analysis. In the future, hybrid models that combine traditional first-principles mechanistic models with models learned from data will outperform either model type alone.
This session will include speakers developing and using AI across a range of problems, from DNA sequence analysis and drug design to prediction of phenotypes from genotypes, diagnosis of diseases such as cancer, retinal myopathy and lung disease, interpretation of medical records, and development of treatment strategies.
The brain changes as we age, and these changes are associated with cognitive decline and an increased risk of dementia (Deary et al., 2009). Neuroimaging can measure these age-related changes, and considerable variability in brain ageing patterns is evident (Raz and Rodrigue, 2006). Equally, rates of age-associated decline affect people very differently. This suggests that the measurement of individual differences in age-related changes to brain structure and function may help establish whether someone’s brain is ageing more or less healthily, with concomitant implications for future health outcomes. To do this, research into biomarkers of the brain ageing process is underway (Cole and Franke, 2017), principally using neuroimaging and in particular magnetic resonance imaging (MRI). Full Abstract
11:20
Clint Davis-Taylor
Automated Parameter Tuning for Living Heart Human Model using Machine Learning and Multiscale Simulations
The Living Heart Human Model is a finite element model with realistic three-dimensional geometries of the four heart chambers. The overall heart response is driven by sequentially coupled electrical conduction and structural contraction analyses, with blood flow modelled as a closed-loop lumped parameter model [1]. It provides a virtual environment to help test medical devices or surgical treatments before they are used in humans. It is critical to tune the model to a patient or disease state; however, this is extremely difficult using the traditional approach of varying one parameter at a time, as there are a large number of parameters with complex interactions between them, and each analysis requires a large amount of CPU time. Another popular type of model in cardiovascular research is the Lumped Parameter Network (LPN) model, which can approximate the pressure-volume relationship and fluid flow properties. This type of model can be solved in real time, but it requires prior knowledge of the cardiac driving functions, i.e., the time-varying pressure-volume relationship of the active chambers [2]. Full Abstract
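To make the lumped-parameter idea concrete for readers unfamiliar with it, the following minimal Python sketch integrates a toy Windkessel-style LPN in which a prescribed time-varying elastance acts as the cardiac driving function. The waveform, parameter values and variable names are illustrative assumptions for this sketch only, not values from the Living Heart Human Model described above.

# Minimal sketch of a lumped-parameter (Windkessel-style) circulation model.
# All parameter values and the elastance waveform are illustrative assumptions.
import numpy as np
from scipy.integrate import solve_ivp

R = 1.0    # peripheral resistance (mmHg*s/mL), assumed
C = 1.5    # arterial compliance (mL/mmHg), assumed
T = 0.8    # cardiac cycle length (s), assumed

def elastance(t):
    # Toy time-varying elastance serving as the "cardiac driving function"
    phase = (t % T) / T
    return 0.1 + 2.4 * np.sin(np.pi * min(phase / 0.6, 1.0)) ** 2  # mmHg/mL

def rhs(t, y):
    v_lv, p_art = y                        # ventricular volume, arterial pressure
    p_lv = elastance(t) * (v_lv - 10.0)    # ventricular pressure (10 mL unstressed volume)
    q_out = max(p_lv - p_art, 0.0) / 0.05  # valve: flow only when p_lv exceeds p_art
    q_in = max(8.0 - p_lv, 0.0) / 0.1      # filling from a fixed atrial pressure
    dv_lv = q_in - q_out
    dp_art = (q_out - p_art / R) / C
    return [dv_lv, dp_art]

sol = solve_ivp(rhs, (0.0, 5 * T), [120.0, 80.0], max_step=1e-3)
print("final LV volume %.1f mL, arterial pressure %.1f mmHg" % (sol.y[0, -1], sol.y[1, -1]))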
11:35
David Wright
Combining molecular simulation and machine learning to INSPIRE improved cancer therapy
Cancer is the second leading cause of death in the United States (accounting for nearly 25% of all deaths). Targeted kinase inhibitors play an increasingly prominent role in the treatment of cancer and account for a significant fraction of the $37 billion U.S. market for oncology drugs in the last decade. Unfortunately, the development of resistance limits the amount of time patients derive benefits from their treatment. The INSPIRE project is laying the foundations for the use of molecular simulation and machine learning (ML) to guide precision cancer therapy, in which therapy is tailored to provide maximum benefit to individual patients based on genetic information about their particular cancer. It is vital that such an approach is based on predictive methods as the vast majority of clinically observed mutations are rare, rendering catalog-building alone insufficient. Full Abstract
11:50
Amanda Minnich
Safety, Reproducibility, Performance: Accelerating cancer drug discovery with ML and HPC technologies
The drug discovery process is costly, slow, and failure-prone. It takes an average of 5.5 years to get to the clinical testing stage, and in this time millions of molecules are tested, thousands are made, and most fail. The ATOM Consortium is working to transform the drug discovery process by utilizing machine learning to pretest many molecules in silico for both safety and efficacy, reducing the costly iterative experimental cycles that are traditionally needed. The consortium comprises LLNL, GlaxoSmithKline, NCI’s Frederick National Laboratory for Cancer Research, and UCSF. Through ATOM’s unique combination of partners, machine learning experts are able to use LLNL’s supercomputers to develop models based on proprietary and public pharma data for over 2 million compounds. The goal of the consortium is to create a new paradigm of drug discovery that would drastically reduce the time from identified drug target to clinical candidate, and we intend to use oncology as the first exemplar of the platform. Full Abstract
12:05
Fangfang Xia
Deep Medical Image Analysis with Representation Learning and Neuromorphic Computing
Deep learning is increasingly used in medical imaging, improving many steps of the processing chain, from acquisition to segmentation and anomaly detection to outcome prediction. Yet significant challenges remain: (1) image-based diagnosis depends on the spatial relationships between local patterns, something convolution and pooling often do not capture adequately; (2) data augmentation, the de facto method for learning 3D pose invariance, requires exponentially many points to achieve robust improvement; (3) labeled medical images are much less abundant than unlabeled ones, especially for heterogeneous pathological cases; and (4) scanning technologies such as magnetic resonance imaging (MRI) can be slow and costly, generally without online learning abilities to focus on regions of clinical interest. To address these challenges, novel algorithmic and hardware approaches are needed for deep learning to reach its full potential in medical imaging. Full Abstract
Artificial intelligence, and machine learning (ML) specifically, is having an increasingly significant impact on our lives. Since the early wins in computer vision from deep learning (DL) in the 2010s, deep neural networks have increasingly been applied to hard problems that have defied previous modeling efforts. This is particularly true in chemistry and drug development, where there are dozens of efforts to replace the traditional drug development computational pipelines with machine learning-based alternatives. In cancer drug development and predictive oncology there are several cases where DL is beginning to show significant successes. In our work we are applying deep learning to the problem of predicting tumor drug response for both single drugs and drug combinations. We have developed drug response models for cell lines, patient-derived xenograft (PDX) models and organoids that are used in preclinical drug development. Due to the limited scale of available PDX data we have focused on transfer learning approaches to generalize response prediction across biological model types. We incorporate uncertainty quantification into our models to enable us to determine the confidence interval of predictions. Our current approaches leverage work on attention, weight sharing between closely related runs for accelerated training, and active learning for prioritization of experiments. Our goal is a broad set of models that can be used to screen drugs during early-stage drug development as well as to predict tumor response for pre-clinical study design. Results to date include response classifications that achieve >92% balanced classification accuracy on a pan-cancer collection of tumor models and a broad collection of drugs. Our work is part of a joint program of investment from the NCI and DOE and is supported in part by the US Exascale Computing Project via the CANDLE project. Full Abstract
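As a hedged illustration of one common way to attach a confidence interval to a neural-network prediction, the sketch below uses Monte Carlo dropout on a small regression model trained on random placeholder features. The architecture, data and interval construction are assumptions made for illustration; they are not the CANDLE team's actual response models.

# Sketch: Monte Carlo dropout uncertainty on a toy drug-response regressor.
# Random placeholder data; architecture and hyperparameters are assumptions.
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.normal(size=(512, 64)).astype("float32")   # stand-in tumor+drug features
y = rng.normal(size=(512, 1)).astype("float32")    # stand-in response values

model = tf.keras.Sequential([
    tf.keras.Input(shape=(64,)),
    tf.keras.layers.Dense(128, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dropout(0.2),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=2, batch_size=64, verbose=0)

# Keep dropout active at inference time and sample repeated stochastic predictions.
samples = np.stack([model(X[:8], training=True).numpy() for _ in range(50)])
mean, std = samples.mean(axis=0), samples.std(axis=0)
print("prediction 0: %.2f +/- %.2f (approx. 95%% CI half-width %.2f)"
      % (mean[0, 0], std[0, 0], 1.96 * std[0, 0]))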
This talk will review some of the challenges posed by the huge growth of experimental data generated by the new generation of large-scale experiments at UK national facilities at the Rutherford Appleton Laboratory site at Harwell near Oxford. Such ‘Big Scientific Data’ comes from the Diamond Light Source and Electron Microscopy Facilities, the ISIS Neutron and Muon Facility, and the UK’s Central Laser Facility. Increasingly, scientists need to use advanced machine learning and other AI technologies both to automate parts of the data pipeline and to help make new scientific discoveries in the analysis of their data. For commercially important applications, such as object recognition, natural language processing and automatic translation, Deep Learning has made dramatic breakthroughs. Google’s DeepMind has now also used Deep Learning technology to develop their AlphaFold tool to make predictions for protein folding. Remarkably, they have been able to achieve some spectacular results for this specific scientific problem. Could Deep Learning be similarly transformative for other scientific problems? After a brief review of some initial applications of machine learning at the Rutherford Appleton Laboratory, we focus on challenges and opportunities for AI in advancing pharmaceutical and materials science. Finally, we discuss the importance of developing some realistic machine learning benchmarks using Big Scientific Data coming from a number of different scientific domains. For the computer vision community, it was the ImageNet database that provided researchers with the capability to evaluate algorithms for object detection and image classification at large scale. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) allowed researchers to compare progress in detection across a wider variety of objects and led directly to the present Deep Learning and GPU revolution. We believe that the creation of a credible ‘Scientific Machine Learning’ (SciML) collection of benchmarks could prove useful and significant for the scientific research community. The talk concludes with some initial examples of our ‘SciML’ benchmark suite and a discussion of the research challenges these benchmarks will enable. Full Abstract
Artificial intelligence is beginning to transform society through technologies like self-driving cars. In drug discovery, too, machine learning and artificial intelligence methods have received increased attention. [1] The increased attention is due not only to methodological progress in machine learning and artificial intelligence, but also to progress in automation for screening, chemistry, imaging and -omics technologies, which have generated very large datasets suitable for machine learning.
While machine learning has been used in drug design for a long time, there have been two exciting developments in recent years. One is progress in synthesis prediction, where deep learning combined with fast search methods such as Monte Carlo Tree Search has been shown to improve synthetic route prediction, as exemplified by a recent Nature article. [2] In this talk I will focus on the second development: applying deep learning-based methods for de novo molecular design. It has always been the dream of the medicinal and computational chemist to be able to search the whole chemical space of an estimated 10^60 molecules. This would be a step change compared to searching enumerable chemical libraries of perhaps 10^10 compounds. Methods to search the whole chemical space through generative deep learning architectures have been developed during the last three years. The presentation will focus on de novo generation of molecules with the Recurrent Neural Network (RNN) architecture. The basics of how molecules are generated will be described and exemplified. After the concept has been introduced, it will be described how the method is used within drug design projects at AstraZeneca. Current limitations will be discussed in conjunction with mitigation strategies to further enhance the potential of RNN-based molecular de novo generation. Full Abstract
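To make the RNN idea concrete, here is a minimal, untrained character-level sketch of how such a model emits a molecule one SMILES token at a time. The vocabulary, architecture and sampling loop are illustrative assumptions rather than AstraZeneca's production system, and an untrained network will of course emit mostly invalid strings.

# Sketch: character-level RNN that samples SMILES strings token by token.
# Untrained toy model; vocabulary and architecture are illustrative assumptions.
import numpy as np
import tensorflow as tf

vocab = ["^", "$", "C", "c", "N", "O", "(", ")", "=", "1", "2"]  # ^ start, $ end
stoi = {ch: i for i, ch in enumerate(vocab)}

model = tf.keras.Sequential([
    tf.keras.Input(shape=(None,), dtype="int32"),
    tf.keras.layers.Embedding(len(vocab), 32),
    tf.keras.layers.GRU(64, return_sequences=True),
    tf.keras.layers.Dense(len(vocab)),           # next-token logits
])

def sample_smiles(max_len=40):
    tokens = [stoi["^"]]
    for _ in range(max_len):
        logits = model(np.array([tokens], dtype="int32"))[0, -1].numpy().astype("float64")
        probs = np.exp(logits - logits.max()); probs /= probs.sum()
        nxt = int(np.random.choice(len(vocab), p=probs))
        if vocab[nxt] == "$":
            break
        tokens.append(nxt)
    return "".join(vocab[t] for t in tokens[1:])

print(sample_smiles())  # a (probably invalid) string until the model is trained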
Due to recent advancements in Deep Learning (DL) algorithms and frameworks, we have started to witness the convergence of High Performance Computing (HPC), Machine Learning (ML), and various application domains, such as healthcare. This opens the possibility to address high-complexity problems that deal with large data and were considered unsolvable in the past. In this talk we will present several use cases going from synthetic to real-world problems for medical image classification, segmentation, and generation, using both 2-D and 3-D data. The focus will be on the scale-out behavior and best practices, while also giving details on the bottlenecks encountered in the various use cases. Jointly working within Intel’s IPCC (Intel Parallel Computing Centers) program, we will present SURFsara’s collaborations with DellEMC, NKI (Netherlands Cancer Institute), and the EXAMODE (www.examode.eu) project consortium. We will demonstrate how large-memory HPC systems enable solving medical AI tasks. Full Abstract
11:05
Justin Wozniak
Accelerating Deep Learning Adoption in Biomedicine With the CANDLE Framework
The Cancer Deep Learning Environment (CANDLE) is an open framework for rapid development, prototyping, and scaling of deep learning applications on high-performance computing (HPC) systems. CANDLE was initially developed to support a focused set of three pilot applications jointly developed by cancer researchers and deep learning / HPC experts, but is now generalizable to a wide range of use cases. It is designed to ease or automate several aspects of the deep learning application development process. CANDLE runs on systems from individual laptops to OLCF Summit, the most powerful supercomputer in the world, and enables researchers to scale application workflows to the largest possible scale. Full Abstract
Effective modelling across the genomic scales within a cellular environment plays a crucial role in understanding the principles that govern cell cycle aberration, for instance in cancer or other disease. The selection of alleles, in conjunction with RNA and protein concentrations and epigenetic factors, contributes significantly to the cell state and its capacity to function. Further to this, sequence-derived features (SDFs) derived from DNA, RNA and protein sequences can contribute useful static information alongside these dynamic processes to improve inference and control for steady-state effects in measurement data. SDFs are commonly applied in transcriptomic studies in which mRNA level acts as a proxy for protein abundance, where adding SDFs to the model can improve predictive power. A major limiting factor of many previous studies has been the lack of supporting data matched to expression levels in the analysis of various biological domains. Full Abstract
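As a small, hedged illustration of using sequence-derived features alongside expression data, the sketch below computes two simple SDFs (GC content and log sequence length) and adds them as covariates to a linear model predicting protein abundance from mRNA level. The feature choice, toy data and model are assumptions made for illustration only.

# Sketch: adding sequence-derived features (SDFs) to an mRNA -> protein model.
# Toy random data; the specific features and linear model are illustrative assumptions.
import numpy as np
from sklearn.linear_model import LinearRegression

def sdf(seq):
    gc = (seq.count("G") + seq.count("C")) / len(seq)   # GC content
    return [gc, np.log(len(seq))]                       # two simple static features

rng = np.random.default_rng(1)
bases = np.array(list("ACGT"))
seqs = ["".join(rng.choice(bases, size=rng.integers(300, 3000))) for _ in range(200)]
mrna = rng.normal(size=200)                             # stand-in mRNA levels
protein = 0.7 * mrna + rng.normal(scale=0.5, size=200)  # stand-in protein abundance

X_expr = mrna.reshape(-1, 1)                                    # expression only
X_full = np.hstack([X_expr, np.array([sdf(s) for s in seqs])])  # expression + SDFs

print("R^2, mRNA only:   %.3f" % LinearRegression().fit(X_expr, protein).score(X_expr, protein))
print("R^2, mRNA + SDFs: %.3f" % LinearRegression().fit(X_full, protein).score(X_full, protein))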
The explosion of healthcare information stored in Electronic Health Records (EHR) has led to an increasing trend of EHR-based applications in computational biomedicine. Unfortunately, applying deep learning (DL) to medicine is no trivial task as EHR data is extremely complex, usually unbalanced, muddled with missing or invalid values and frequently contains a heterogeneous mixture of data types and structured/unstructured formats. The problem has been compounded by the lack of publicly available datasets that are large enough for the development of deep learning methods as well as by the lack of benchmarking tasks and metrics to compare results. The creation of the MIMIC-III Clinical Database [1] and the recent work of Harutyunyan et al. [2] proposing benchmarking tasks and metrics are accelerating advances in the field. Full Abstract
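As a brief, hedged illustration of two of the EHR issues mentioned above (missing values and class imbalance), the sketch below imputes gaps and applies class weighting on a toy tabular cohort. The column names, data and logistic-regression model are invented for illustration and are not part of the MIMIC-III benchmarks.

# Sketch: handling missing values and class imbalance in toy EHR-style data.
# Column names and data are invented for illustration; not a MIMIC-III pipeline.
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
df = pd.DataFrame({
    "age": rng.integers(20, 90, size=1000).astype(float),
    "heart_rate": rng.normal(80, 15, size=1000),
    "lactate": rng.normal(1.5, 0.8, size=1000),
    "mortality": (rng.random(1000) < 0.1).astype(int),   # ~10% positives: imbalanced
})
df.loc[rng.random(1000) < 0.3, "lactate"] = np.nan       # simulate missing lab values

X = SimpleImputer(strategy="median").fit_transform(df[["age", "heart_rate", "lactate"]])
y = df["mortality"].values

# class_weight="balanced" reweights the rare positive class during training
clf = LogisticRegression(class_weight="balanced", max_iter=1000).fit(X, y)
print("share predicted positive:", clf.predict(X).mean())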
11:50
Marwin Segler
(invited speaker)
GuacaMol: Benchmarking Models for De Novo Molecular Design
Recently, generative models based on deep neural networks have been proposed to perform de novo design, that is, to directly generate molecules with required property profiles by virtual design-make-test cycles [1,2]. Neural generative models can learn to produce diverse and synthesisable molecules from large datasets, for example by employing recurrent neural networks, which makes them simpler to set up and potentially more powerful than established de novo design approaches relying on hand-coded rules or fragmentation schemes. Full Abstract
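To illustrate the kind of distribution-learning metrics such a benchmark reports (validity, uniqueness, novelty), the following sketch scores a handful of hard-coded SMILES strings with RDKit. The example molecules and the simplified metric definitions are assumptions made for illustration; this is not the GuacaMol implementation itself.

# Sketch: simple validity / uniqueness / novelty scores for generated SMILES.
# Hard-coded example strings; simplified metrics, not the GuacaMol code itself.
from rdkit import Chem

generated = ["CCO", "c1ccccc1", "CCO", "C1=CC=CC=C1", "C(("]    # pretend model output
training_set = {Chem.MolToSmiles(Chem.MolFromSmiles("CCO"))}    # canonical training SMILES

mols = [Chem.MolFromSmiles(s) for s in generated]               # None for invalid strings
canonical = [Chem.MolToSmiles(m) for m in mols if m is not None]

validity = len(canonical) / len(generated)
uniqueness = len(set(canonical)) / len(canonical)
novelty = len(set(canonical) - training_set) / len(set(canonical))

print("validity %.2f, uniqueness %.2f, novelty %.2f" % (validity, uniqueness, novelty))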